Update Summarization using a Multi-level Hierarchical Dirichlet Process Model

Authors

  • Jiwei Li
  • Sujian Li
  • Xun Wang
  • Ye Tian
  • Baobao Chang
Abstract

Update summarization is a new challenge that combines salience ranking with novelty detection. Previous research usually converts novelty detection into a problem of redundancy removal or salience re-ranking, and seldom explores the birth, splitting, merging, and death of aspects for a given topic. In this paper, we borrow the idea of evolutionary clustering and propose a three-level HDP model named h-uHDP, which reveals the diversity and commonality between aspects discovered from two different epochs (i.e., the history epoch and the update epoch). Specifically, we strengthen the modeling of the sentence level in h-uHDP to suit the sentence-extraction-based summarization framework. Automatic and manual evaluations on TAC data demonstrate the effectiveness of our update summarization algorithm, especially with respect to the novelty criterion.
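The model above builds on the hierarchical Dirichlet process (HDP). As a rough illustration only, and not the authors' h-uHDP (whose three levels and sentence-level modeling are not specified here), the sketch below samples from a standard two-level HDP prior in its Chinese restaurant franchise form. It treats the history and update epochs as two groups that share one franchise-wide "menu" of aspects, so some aspects are shared across epochs and others appear in only one. The function name `sample_crf`, the concentration parameters `alpha` and `gamma`, and the group sizes are hypothetical choices for this example.

```python
import random
from collections import Counter


def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Sample dish (aspect) ids from a two-level Chinese restaurant franchise prior.

    Each group (here: an epoch) is a restaurant, each customer (here: a sentence)
    sits at a table, and every table is served a dish (here: an aspect) from a
    menu shared by all groups. Returns one list of dish ids per group.
    This is a generic HDP prior sketch, not the h-uHDP model.
    """
    rng = random.Random(seed)
    dish_tables = Counter()  # tables serving each dish, summed over all groups
    assignments = []
    for n in group_sizes:
        table_counts = []  # customers seated at each table of this group
        table_dish = []    # dish served at each table of this group
        dishes = []
        for _ in range(n):
            # Existing table t with weight table_counts[t]; a new table with weight alpha.
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # New table: existing dish k with weight dish_tables[k]; new dish with weight gamma.
                known = list(dish_tables)
                dish_weights = [dish_tables[k] for k in known] + [gamma]
                j = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                k = known[j] if j < len(known) else len(dish_tables)
                dish_tables[k] += 1
                table_counts.append(1)
                table_dish.append(k)
            else:
                table_counts[t] += 1
            dishes.append(table_dish[t])
        assignments.append(dishes)
    return assignments


if __name__ == "__main__":
    history, update = sample_crf([30, 30], alpha=1.0, gamma=1.5, seed=42)
    shared = set(history) & set(update)
    born = set(update) - set(history)  # aspects that first appear in the update epoch
    died = set(history) - set(update)  # aspects not reused in the update epoch
    print("shared aspects:", sorted(shared))
    print("aspects born in update:", sorted(born))
    print("aspects absent from update:", sorted(died))
```

In this toy prior, an aspect used only in the update group plays the role of a newly born aspect and one used only in the history group plays the role of a dying one; splitting, merging, and the extra sentence-level layer described in the abstract are not modeled.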

Similar resources

The Information Extraction Systems of PRIS at Temporal Summarization Track

This paper describes the information extraction systems of PRIS at Temporal Summarization Track. The Temporal Summarization Track includes two tasks: sequential update summarization and value tracking. For the first task, we focus on keyword mining and sentence scoring. The system utilizes hierarchical Latent Dirichlet Allocation (LDA) to do keyword mining and score sentences with k...

Multilingual Multi-document Summarization with Enhanced hLDA Features

This paper presents the state-of-the-art research progress on multilingual multi-document summarization. Our method utilizes the hLDA (hierarchical Latent Dirichlet Allocation) algorithm to model the documents first. A new feature is proposed from the hLDA modeling results, which can reflect semantic information to some extent. Then it combines this new feature with different other features to perfor...

The CIST Summarization System at TAC 2011

In this report, we present our extractive summarization system on both summarization and multiling tracks of TAC 2011. We introduce an extractive multi-document summarization method based on hierarchical topic model of hierarchical Latent Dirichlet Allocation (hLDA) and sentence compression. hLDA is a representative generative probabilistic model, which not only can mine latent topics from a la...

Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients

We present PathSum, a high-performing hierarchical-topic based single- and multi-document automatic text summarization framework. This approach leverages Bayesian nonparametric methods to model sentences as paths through a tree and create a hierarchy of topics from the input in an unsupervised setting. We describe the generative model used to learn a topic tree based on hierarchical latent Dirich...

Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

Timeline summarization aims at generating concise summaries and giving readers faster and better access to understanding the evolution of news. It is a new challenge that combines the salience ranking problem with novelty detection. Previous research in this field seldom explores the evolutionary pattern of topics such as birth, splitting, merging, developing and death. In this paper, we develop a...

Publication date: 2012